## AMENDMENTS TO THE CLAIMS

Please amend claims 30-32 as indicated below. Please add new claims 33-37.

Claims 1-9 (Cancelled)


10. (Previously Added) A processor comprising:

a memory to store a first packed data operand having a first plurality of data elements and a second packed data operand having a second plurality of data elements;

a partial-width packed data instruction to indicate the first packed data operand and the second packed data operand and to indicate a first operation to be performed on a subset of corresponding pairs of data elements of the first and the second packed data operands;

a decoder coupled with the memory to receive the partial-width packed data instruction and to decode the partial-width packed data instruction; and

a partial-width execution unit coupled with the decoder to execute the operation on the subset of corresponding pairs of data elements.

11. (Previously Added) The processor of claim 10:

wherein the decoder is a decoder to convert the partial-width packed data instruction into a first micro instruction that corresponds to a first subset of at least one corresponding pair of data elements of the first and the second packed data operands and a second micro instruction that corresponds to a second subset of at least one corresponding pair of data elements of the first and the second packed data operands; and


wherein the partial-width execution unit is a partial-width execution unit to execute an operation specified by the first micro instruction on the first subset.

- 12. (Previously Added) The processor of claim 11, further comprising a port to receive at least one data element of the first subset and to not receive a data element of the second subset.
- 13. (Previously Added) The processor of claim 11:

wherein the processor is a processor to eliminate the second micro instruction; and wherein the processor is a processor to set at least one result data element corresponding to the second subset to a predetermined value.

14. (Previously Added) The processor of claim 11:

further comprising delay circuitry to delay execution of operations on the second subset; and

wherein the partial-width execution unit is a partial-width execution unit to execute an operation specified by the second partial-width micro instruction on the second subset after the delay.

15. (Previously Added) The processor of claim 10:

further comprising a first port coupled with the memory to receive the first packed data operand and a second port coupled with the memory to substantially simultaneously receive the second packed data operand;

further comprising divide circuitry to divide the first packed data operand into a first subset comprising at least one data element and a second subset comprising at least one data element and to divide the second packed data operand into a third subset comprising at least one data element and a fourth subset comprising at least one data element; and

wherein the partial-width execution unit is a partial-width execution unit to perform the first operation on at least one corresponding pair of data elements of the first and the third subsets to generate at least one resulting data element.

Attorney Docket No.: 42P5193C Application No.: 09/852,217

16. (Previously Added) The processor of claim 15:

further comprising delay circuitry to delay the second subset and to delay the fourth subset; and

wherein after the delay, the partial-width execution unit is a partial-width execution unit to perform the first operation on at least one corresponding pair of data elements of the second and the fourth subsets to generate at least one additional resulting data element.

- 17. (Previously Added) The processor of claim 15, wherein the partial-width execution unit is a partial-width execution unit to generate at least one additional resulting data element corresponding to the second and the fourth subsets by setting the at least one additional resulting data element to a predetermined value.
- 18. (Previously Added) The processor of claim 15, wherein the partial-width execution unit is a partial-width execution unit to execute the packed data instruction and a second similar packed data instruction on a half clock cycle.
- 19. (Previously Added) The processor of claim 15, wherein the partial-width execution unit is a 64-bit partial-width execution unit, and wherein the first and the second packed data operands are 128-bit operands.


20. (Previously Added) A method comprising:

receiving a packed data instruction that specifies memory locations of a first full-width packed data operand having a plurality of data elements and a second full-width packed data operand having a corresponding plurality of data elements;

substantially simultaneously accessing the first full-width packed data operand and the second full-width packed data operand from the memory locations;

dividing the first full-width packed data operand into a first subset of data elements and a second subset of data elements and dividing the second full-width packed data operand into a third subset of data elements and a fourth subset of data elements;

performing an operation specified by the packed data instruction on the first and third subsets of data elements to generate a first resulting one or more data elements;

delaying the second and fourth subsets of data elements;

after said delaying, performing an operation specified by the packed data instruction on the second and the fourth subsets of data elements to generate a second resulting one or more data elements; and

storing the first and the second resulting data elements in a common packed data operand.

- 21. (Previously Added) The method of claim 20, wherein performing an operation specified by the macro instruction on the second and fourth subsets comprises setting a data element to a predetermined value.
- 22. (Previously Added) The method of claim 20, wherein dividing includes dividing a 128-bit packed data operand into a 64-bit segment of two low order data elements and a 64-bit segment of two high order data elements.
- 23. (Previously Added) A processor comprising:

a packed data instruction to specify an operation on a plurality of data elements of at least one packed data operand;

a decoder to generate a first micro instruction and a second micro instruction corresponding to the packed data instruction, the first micro instruction specifying a first operation and the second micro instruction specifying a second operation;

an execution unit to execute an operation specified by the first micro instruction on only a subset of the plurality of packed data elements; and

circuitry to eliminate the second micro instruction.

- 24. (Previously Added) The processor of claim 23, wherein the decoder is a decoder to create the second micro instruction by replicating the first micro instruction to create a replica and modifying the replica to create the second micro instruction.
- 25. (Previously Added) The processor of claim 23, wherein the execution unit is an execution unit to set a data element in a result packed data operand to a predetermined value.
- 26. (Previously Added) A method comprising:

receiving a packed data instruction specifying memory locations of a first packed data operand and a second packed data operand;

converting the packed data instruction into a first packed data micro instruction and a second packed data micro instruction;

executing the first packed data micro instruction including accessing only a subset of data elements of the first and the second packed data operands comprising at least one pair of corresponding data elements from the first and the second packed data operands and causing an operation specified by the packed data instruction to be performed on the subset to produce a resulting one or more data elements; and

executing the second packed data micro instruction including accessing only a subset of data elements of the first and the second packed data operands comprising at least one pair of corresponding data elements from the first and the second packed data operands and causing an operation specified by the packed data instruction to be separately performed on the subset to produce a resulting one or more additional data elements.

- 27. (Previously Added) The method of claim 26, wherein executing the second packed data micro instruction includes setting a data element of the one or more additional data elements to a predetermined value.
- 28. (Previously Added) The method of claim 26, further comprising:

  writing the resulting one or more data elements to a result packed data operand; and writing the resulting one or more additional data elements to the same result packed data operand.
- 29. (Previously Added) The method of claim 26, wherein executing the second packed data micro instruction is delayed relative to executing the first packed data micro instruction.
- 30. (Currently Amended) A processor comprising:

  a memory to store a first packed data operand and a second packed data operand;

  an instruction to indicate the first packed data operand and the second packed data operand and to indicate an operation to be performed on -a- the first packed data operand and the second packed data operand;

decoder means for decoding to decode the instruction; and

execution means for executing to execute the instruction.


31. (Currently Amended) The processor of claim 30, wherein the decoder means is a decoder means for decoding to decode the instruction into a first micro instruction that specifies an operation on only a portion of the first and the second packed data operands and a second micro instruction that specifies an operation on only a different portion of the first and the second packed data operands.


- 32. (Currently Amended) The processor of claim 30, wherein the execution means is an execution means for performing to perform operations specified by the instruction on a first subset of corresponding pairs of data elements of the first and the second packed data operands and after a delay to perform operations specified by the instruction means on a second subset of corresponding pairs of data elements of the first and the second packed data operands.
- 33. (New) A computer system comprising:

a bus;

a storage device including a flash memory coupled to the bus to store data;

a processor coupled to the storage device by the bus to execute instructions;

a memory of the processor to store a first packed data operand having a first plurality of data elements and a second packed data operand having a second plurality of data elements;

a decoder of the processor coupled with the memory of the processor, the decoder to receive a partial-width packed data instruction and to decode the partial-width packed data instruction, wherein the partial width packed data instruction indicates the first packed data operand and the second packed data operand, and indicates a first operation to be performed on a subset of corresponding pairs of data elements of the first and the second packed data operands; and

a partial-width execution unit of the processor coupled with the decoder to execute the operation on the subset of corresponding pairs of data elements.

34. (New) The computer system of claim 33:

wherein the decoder is a decoder to convert the partial-width packed data instruction into a first micro instruction that corresponds to a first subset of at least one corresponding pair of data elements of the first and the second packed data operands and a second micro instruction that corresponds to a second subset of at least one corresponding pair of data elements of the first and the second packed data operands; and

wherein the partial-width execution unit is a partial-width execution unit to execute an operation specified by the first micro instruction on the first subset.

35. (New) The computer system of claim 34:

wherein the processor is a processor to eliminate the second micro instruction; and wherein the processor is a processor to set at least one result data element corresponding to the second subset to a predetermined value.

36. (New) The computer system of claim 33:

further comprising a first port coupled with the memory to receive the first packed data operand and a second port coupled with the memory to substantially simultaneously receive the second packed data operand;

further comprising divide circuitry to divide the first packed data operand into a first subset comprising at least one data element and a second subset comprising at least one data element and to divide the second packed data operand into a third subset comprising at least one data element and a fourth subset comprising at least one data element; and

wherein the partial-width execution unit is a partial-width execution unit to perform the first operation on at least one corresponding pair of data elements of the first and the third subsets to generate at least one resulting data element.

37. (New) The computer system of claim 36:

further comprising delay circuitry to delay the second subset and to delay the fourth subset; and

wherein after the delay, the partial-width execution unit is a partial-width execution unit to perform the first operation on at least one corresponding pair of data elements of the second and the fourth subsets to generate at least one additional resulting data element.